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Abstract 

The rapid worldwide spread of the severe acute respiratory syndrome (SARS) demonstrated the potential 
threat an infectious disease poses in a closely interconnected and interdependent world. Here we introduce a 
probabilistic model which describes the worldwide spreading of infectious diseases and demonstrate that a 
forecast of the geographical spread of epidemics is indeed possible. It combines a stochastic local infection 
dynamics between individuals with stochastic transport in a worldwide network which takes into account 
the national and international civil aviation traffic. Our simulations of the SARS outbreak are in suprisingly 
good agreement with published case reports. We show that the high degree of predictability is caused by the 
strong heterogeneity of the network. Our model can be used to predict the worldwide spreading of future 
infectious diseases and to identify endangered regions in advance. The performance of different control 
strategies is analyzed and our simulations show that a quick and focused reaction is essential to inhibit the 
global spreading of epidemics. 
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I. INTRODUCTION 



The application of mathematical modeling to the spread of epidemics has a long history and 
was initiated by Daniel Bernoulli's work on the effect of cow-pox inoculation on the spread of 
smallpox in 1760| 1]. Most studies concentrate on the local temporal development of diseases and 
epidemics. Their geographical spread is less well understood, although important progress has 
been achieved in a number of case studies . The key question and difficulty is how to 

include spatial effects and to quantify the dispersal of individuals. This problem has been stud- 
ied with some effort in various ecological systems, for instance in plant dispersal by seeds jfl. 
Today's volume, speed and non-locality of human travel (Fig.[TJ and the rapid worldwide spread 
of SARS (Fig. El) demonstrate that modern epidemics cannot be accounted for by local diffusion 
models which are only applicable as long as the mean distance traveled by individuals is small 
compared to geographical extents. These local reaction-diffusion models generically lead to epi- 
demic wavefronts, which were observed for example in the geotemporal spread of the Black Death 



in Europe from 1347-50 



Here we focus on mechanisms of the worldwide spreading of infectious diseases. Our model 
consists of two parts: a local infection dynamics and the global traveling dynamics of individuals 
similar to the models investigated in Jill . However, both constituents of our model are treated on a 
stochastic level, taking full account of fluctuations of disease transmission, latency and recovery on 
one hand, and fluctuations of the geographical dispersal of individuals on the other. Furthermore 
we incorporate nearly the entire civil aviation network. 

H. LOCAL INFECTION DYNAMICS 

In the standard deterministic SIR model for infectious diseases, a population with iV individuals 
is categorized according to its infection status: susceptibles (S), infectious (/) or recovered and 
immune (i?)!^,!!^. The dynamics which specifies the flow among these categories is given by 

ds/dt = — asj, dj /dt = a s j — f3 j , (1) 

where s = S/N and j = I/N denote the relative number of susceptibles and infecteds, respec- 
tively. The relative number of recovered individuals r = R/N is obtained by conservation of the 
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Figure 1: Global aviation network. A geographical representation of the civil aviation traffic among 
the 500 largest international airports in over 100 different countries is shown. Each line represents 
a direct connection between airports. The color encodes the number of passengers per day (see 
color code at the bottom) traveling between two airports. The network accounts for more than 
95% of the international civil aviation traffic. For each pair of airports, we checked all flights 
departing from airport j and arriving at airport i. The amount of passengers carried by a specific 
flight within one week can be estimated by the size of the aircraft (We used manufacturer capacity 
information on over 150 different aircraft types) times the number of days the flight operates in one 
week. The sum of all flights yields the passengers per week, i.e. in Eq. Q. We computed the 
total passenger capacity M ij of eacn airport j per week and found very good agreement with 
independently obtained airport capacities. 
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entire population, i.e. r(t) = 1 —j(t) —s(t), and r = is the average infectious period. The key 
quantity describing the infection is the basic reproduction number p = a/ (3, which is the average 
number of secondary infections transmitted by an infectious individual in an otherwise uninfected 
population. If p > 1 and the initial relative number of susceptibles is greater than a critical value 
s c = 1/po an epidemic develops (dj/dt > 0). As the number of infected individuals increases, the 
fraction of susceptibles s decreases and thus the number of contacts of infected individuals withfig 
susceptibles decreases until s = s c when the epidemic reaches its maximum and subsequently 
decays. 

The above SIR model incorporates the underlying mechanism of transmission and recovery 
dynamics and has been able to account for experimental data in a number of cases. However, 
transmission of and recovery from an infection are intrinsically stochastic processes and the deter- 
ministic SIR model does not account for fluctuations. These fluctuations are particularly important 
at the beginning of an epidemic when the number of infecteds is very small. 

In this regime a probabilistic description must be used. Schematically the stochastic infection 
dynamics is given by 

S + 7^27, 7^0 . (2) 

The first reaction reflects the fact that an encounter of an infected individual with a susceptible 
results in two infecteds at a probability rate a, the second indicates that infecteds are removed 
(recover) at a rate (3 and effectively disappear from the population. The quantity of interest is the 
probability p(S, 7; t) of finding a number S of susceptibles and 7 infecteds in a population of size 
N at time t. Assuming that the process is Markovian on the relevant time scales, the dynamics of 
this probability is governed by the master equation 

d t p{S,I;t) = ^(S+l)(I-l)p(S + l,I-l;t)+P(I+l)p(S,I+l;t) 

- (jjSIp-pi)p{S,I;t) . (3) 

In addition to this dynamics one must specify the initial condition p( S, 7; t — to) which is typically 
assumed to be a small but fixed number of infecteds 7 , i.e. p(S, I;t = t ) = 5ij 5s,n-i - 

The relation of the probabilistic master equation © to the deterministic SIR-model (UJ) can be 
made in the limit of a large but finite population, i.e. N ^> 1. In this limit one can approximate 
the master equation by a Fokker-Planck equation by means of an expansion in terms of condi- 
tional moments (Kramers-Moyal expansion II 1 311 ^ - see the supplement material. The associated 
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description in terms of stochastic Langevin equations reads 

ds/dt = —asj-\ — t=V 'a s j£i(t) (4) 



. N 

dj/dt = as j- (3 j - -±=^a~s~]Ut) + 4=^6(0 (5) 



Here, the independent Gaussian white noise forces £i(£) and £ 2 (0 reflect the fluctuations of trans- 
mission and recovery, respectively. Note that the magnitude of the fluctuations are oc 1/ y/N and 
disappear in the limit iV — > oo in which case Eqs. CQ) are recovered. However, for large but finite 
N a crucial difference is apparent: Eqs. © and © contain fluctuating forces and N is a param- 
eter of the system. A careful analysis shows that even for very large populations (i.e. N ^> 1) 
fluctuations play a prominent role in the initial phase of an epidemic outbreak and cannot be ne- 
glected. For instance even when p > 0, a small initial number of infecteds in a population may 
no necessarily lead to an outbreak which cannot be accounted for by the deterministic model. 



III. DISPERSAL ON THE AVIATION NETWORK 



As individuals travel around the world, the disease may spread from one place to another. In 
order to quantify the traveling behavior of individuals, we have analyzed all national and interna- 
tional civil flights among the 500 largest airports by passenger capacity. This analysis yields the 
global aviation network shown in Fig. [H further details of the data collection is compiled in the 
supplement material. The strength of a connection between two airports is given by the passengers 
capacity, i.e. the number of passengers that travel this route per day. 

We incorporate the global dispersion of individuals into our model by dividing the population 
into M local urban populations labeled i containing Ni individuals. For each i the number of 
susceptibles, infecteds individuals is given by Si and Ii, respectively. In each urban area the 
infection dynamics is governed by the master equation ©. 

Stochastic dispersal of individuals is defined by a matrix %j of transition probability rates 
between populations 

Si Sj Ii Ij, i,j = 1, ...,M , (6) 

where 7^ = 0. Along the same lines as presented above one can formulate a master equation for 
the pair of vectors X = {S\, Ji, Sm, Im} which defines the stochastic state of the system. This 
master equation is provided explicitely in supplement material. 
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In order to account for the global spread of an epidemic via the aviation network one needs to 
specify the matrix 7^. Since the global exchange of individuals between urban areas is carried 
out by airborne travel one can estimate the probability rate matrix 7^ by t. We assume that an 
individual remains in urban area for some time before traveling to another region. A flight j — ► i 
is chosen according to the weights 



where M^- is the number of passengers per unit time that depart from an airport in region j and 
arrive at airport in region i. The matrix w accounts for the overall connectivity of the aviation 
network as well as for the heterogeneity in the strength of the connections. Denoting the typical 
time period individuals remain at i by Tj the matrix 7^ is expressed in terms of Wij according to 
7y = Wij/rj. If we assume that each airport is surrounded by a catchment area with a population 
Ni the typical time individuals remain at i is given by Tj = Ni/ J2j Mji. If the capacity of airport 
% reflects the need of the associated catchment area (i.e. N oc J2j Mji), the waiting times Tj are 
identical for all i, i.e. Tj = r = 7" 1 which implies 7^ = 710^-. In our model the global rate 7 
is a free parameter. In order to verify the its validity, we apply our model to the SARS outbreak. 
The rate 7 can be computed from the ratio of the number of infected individuals in Hong Kong to 
the number of infected individuals outside Hong Kong, which is provided by the WHO data. For 
the local infection dynamics we use a simple extension of the above stochastic SIR model: The 
categories S, I and R are completed by a category L of latent individuals which have been infected 
but are not infectious yet themselves, accounting for the latency of the disease. In our simulations 
individuals remain in the latent or infectious stage for periods drawn from the delay distribution 
provided in Fig. 2 in full . In our simulation we chose random infection times the distribution of 
which is known for SARS full . In a realistic simulation the basic reproduction number p cannot 
be assumed to be constant over time. Successful control measures, for instance, generally decrease 
p . We chose a time dependent p (t) as provided by Refs. 

IV. RESULTS OF SIMULATIONS 

Fig. El depicts a geographical representation of the results of our simulations. Initially, an 
infected individual was placed in Hong Kong. For this initial condition we simulated 1000 real- 
izations of the stochastic model and computed the mean value (I(t)) of the number of infecteds 




(7) 
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Figure 2: (a) Geographical representation of the global spreading of probable SARS cases on 
May 30th, 2003 as reported by the WHO and CDC. The first cases of SARS emerged in mid 
November 2002 in Guangdong Province, Chinajl7|. The disease was then carried to Hong Kong 
on the February 21st, 2003 and began spreading around the world along international air travel 
routes, as tourists and the medical doctors who treated the early cases traveled internationally. 
As the disease moved out of southern China, the first hot zones of SARS were Hong Kong, Sin- 
gapore, Hanoi (Viet Nam) and Toronto (Canada), but soon cases in Taiwan, Thailand, the United 
States, Europe and elsewhere were reported, (a) Geographical representation of the results of 
our simulations 90 days after an initial infection in Hong Kong, The simulation corresponds to the 
real SARS infection at the end of May, 2003. Since our Since the simulations cannot describe 
the infection in China, where the disease started in November 2002, we used the WHO data for 
China. 
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at each node i = 1, ...,M of the network. Since the size of catchment areas varies on many 
scales, the fluctuation range is best quantied by the means of the relative variance of z = log /, 
i.e. 7] = J(z 2 ) - {zYl < z > . In our simulations we computed this measure for every i of the 
network. Fig |21 shows the prediction of our model for the spread of S ARS at t = 90 days after 
the initial outbreak in Hong Kong (February 19, 2003), corresponding to the May 20, 2003. The 
results of our simulations are in remarkable agreement with the worldwide spreading of SARS as 
reported by the WHO (compare Fig. El): There is an almost one-to-one correspondence between 
infected countries as predicted by the simulations and the WHO data. 

Also the orders of magnitude of the numbers of infected individuals in a country agree (Ta- 
ble H|). While for most countries the reported cases by the WHO lie within the fluctuation range, 
two deviation between the reported cases and the predictions of the simulation are apparent: Our 
simulations predict a relatively high number of SARS cases in Japan (between 26.6 and 137.0). 
However, the Japanese Government reported no confirmed case (only 5 suspected cases) of SARS 
in Japan, as of May 30, 2003. How a single realization may deviate from the expectation can be 
seen from the difference between the simulation and the reported cases in the USA and Canada. 
The simulations show that on average the USA should have a higher number of SARS cases than 
Canada, although the opposite was reported by the WHO. The impact of the inherent stochasticity 
of the infection and traveling dynamics is discussed in the next section. 

V. THE IMPACT OF FLUCTUATIONS 

Bearing in mind the low number of infections and the small value of p for SARS, the high de- 
gree of predictability, i.e. the low impact of fluctuations on the network level, is rather surprising, 
especially because our simulations take into account the full spectrum of fluctuations of disease 
transmission, recovery and dispersal and that the system evolves on a highly complex network. 
Naively, one expects that dispersal fluctuations between two given populations are amplified as 
the epidemic spreads globally and that no prediction can be made. In order to clarify this impor- 
tant point, consider the system of two confined populations A and B which exchange individuals 
as depicted in Fig. El For simplicity we assume that both populations have the same size (i.e. 
Na = Nb = N) and individuals traverse at a rate 7. Now assume that initially a small number of 
infected 7 is introduced to population A without any infecteds contained in B. For a sufficiently 
high number of infecteds in A an epidemic occurs. For 7 > infecteds are introduced to B and a 
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Country 


WHO 

05/20/2003 


WHO 

05/30/2003 


Average 


Simi 

V 


lation 
Min 


Max 


Hong Kong 


1718 


1739 


1951 


0.35 


1373.9 


2770.4 


Taiwan 


383 


676 


318.2 


0.55 


184.0 


550.3 


Singapore 


206 


206 


136.6 


0.68 


69.4 


268.7 


Japan 






60.4 


0.84 


26.6 


137.0 


Canada 


140 


188 


41.8 


0.94 


16.4 


106.6 


USA 


67 


66 


65.9 


0.84 


28.4 


152.7 


Vietnam 


63 


63 


49.2 


0.86 


20.7 


116.3 


Philippines 


12 


12 


30.0 


0.97 


6.2 


50.7 


Germany 


9 


10 


14.4 


1.1 


4.8 


43.1 


Netherlands 






5.9 


1.09 


2.0 


17.6 


Bangladesh 






10 


1.15 


3.2 


31.6 


Mongolia 


9 


9 










Italy 


9 


9 


5.3 


1.02 


1.9 


14.6 


Thailand 


8 


8 


35.4 


0.89 


14.5 


86.8 


France 


7 


7 


7.6 


1.09 


2.6 


22.6 


Australia 


6 


6 


27.0 


1.05 


10.1 


72.5 


Malaysia 


7 


5 


17.7 


1.05 


6.2 


50.7 


United Kingdom 


4 


4 


16.7 


1.04 


5.9 


47.0 



Table I: A comparison of the SARS case reports provided by the WHO and the results of our 
simulation for all countries with a reported case number > 4. The expected number of infecteds 
predicted by our model is estimated by the average over 1000 realizations of the stochastic model. 
The epidemic was simulation for t = 90 days after the initial outbreak of SARS in Hong Kong 
(February 19, 2003), yielding a simulation end of May 20, 2003. The range defined by column 6 
and 7 was computed by means of the fluctuation measure 77 (see text) which is approximately one 
for all countries. 
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Figure 3: Two confined populations with exchange of individuals. In each population the dynamics 
is governed by the SIR-reaction scheme ©. Individuals travel from on population to the other at 
a rate 7. Parameters are N A = N B = 10, 000, R = 4 and an initial number of infecteds I = 20 in 
population A Left: The probability ^(7) of an outbreak occurring in population B as a function of 
transition rate 7. The insets depict histograms of the time lag T between the outbreaks in A and B 
for those realization for which an outbreak occurs in B. The circles are results of the simulations of 
100,000 realizations, the solid curve is the analytic result of Eq. {8} Right: A star-shaped network 
with a central population A connected to M - 1 populations Bi, #M-iwith rates 71, ...,7m-i- 
The cumulated variance (Eq. ??) for a star network with 32 populations is depicted as a function of 
the average transmission rate 7. Two cases are exemplified: equal rates (circles) and distributed 
rates according to Eq.[T0]with 7 max /7mm ~ 1000 (squares). The solid lines show the analytical 
results given by Eq.|9]and Eq. |8] Parameters are N A = N B = 10,000, R = 4 and an initial 
number of infecteds I = 20 in population A. The numerical values are obtained by calculating 
the variance of the fluctuations of 1 00 different realizations of the epidemic outbreak for each 7. 

subsequent outbreak may occur in B after a time lag T. Fig. |3 depicts the results of simulations 
for two populations with iV = 10, 000 and p = 4. Various realizations of the time course /a(^) 
and /^(t) of the epidemic in both populations we computed. The initial number of infecteds in 
population was /^(t = 0) = I = 20. The left panel depicts the probability ^(7) of an outbreak 
occuring in population B as a function if the transition rate 7. For large enough rates the proba- 
bility is nearly unity, since a sufficient number of infecteds is introduced to B. For very low rates 
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7 no infecteds are introduced to B during the time span of the epidemic in A and thus ^(7) — > 
as 7 — > 0. For intermediate values of 7 the probability 75(7) is neither one nor unity and the time 
course in population B cannot be predicted with certainty. The function p( 7 ) is given by 



where the critical rate 7* is a function of the parameters po and N. The insets depict histograms 
of the time lag T for those realization for which an outbreak occured in B. Each histogram 
corresponds to a different transition rate 7. The smaller 7 the higher the variability in T. Note that 
even in a range in which ^(7) « 1, the time lag T is still a stochastic quantity with a high degree 
of variance (see also the supplement material). 

Consequently, the introduction of stochastic exchange of infected individuals leads to a lack of 
predictability in the time of onset of the initially uninfected population. In the light of the analysis 
of two populations, the predictability in the case of SARS on the aviation network seems even 
more puzzling. 

The situation changes drastically in networks which exhibits a high degree of variability in the 
rate matrix 7^. Clearly, this is the case for the aviation network. Consider the simple network 
depicted in Fig.|3J Each population contains N individuals. A central population A is coupled to 
a set of M — 1 surrounding populations Bi, ...Bj^-i- Assume that initially a number of infecteds 
Iq is introduced to the central population A such that an outbreak occurs. The entire set of rates 
{7j} j=i,...,M-i determines the behaviour in the surrounding populations. If all rates jj are identical 
and very small we expect no infection to occur in the Bj, for large enough jj an outbreak will occur 
in every Bj. In the aviation network, however, transition rates are distributed on many scales and 
the response of the network to a central outbreak depends on the statistical properties of this 
distribution denoted byg(7). In order to quantify the reaction of the network we introduce for each 
surrounding population a binary number ^ with j = 1, M — 1 which is unity if an outbreak 
occurs in Bj and zero if it doesn't. According to Eq. © for a given rate 7 this quantity is a random 
number with a conditional probability density p(£i|7) = (1 — ^(7)) + p("y) — 1). The 
variability of the network is thence quantified be the cumulative variance per population and we 
define 



as a measure for the uncertainty of the network response. If for example 5(7) = 5(7 — 7), i.e. 
all transition rate are identical and equal to 7, then a (7) = 4p(7)(l — p(j), which is unity for 



p( 7 ) = 1 _ eX p(-7/7*), 



(8) 




(9) 
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p(7) = 1/2. Comparing with Eq. © we see that when 7 = 7* log 2 the system with identical 
transition rates 7i = 7 exhibits the highest degree of unpredictability when the rates are of the 
order of the critical rate defined by ©. The function (7(7) is shown in Fig.|3J 
Now assume that the rate are drawn from a distribution 

<l{l) = ; 7 —, C- 7max < 7 < 7min- (10) 

log(7max/7min) 7 

which implies a high degree of variance within the interval [7 m i n , 7max] (i-e. 7j is distributed uni- 
formly on a logarithmic scale). This high variability in rates drastically changes the predictability 
of the system. Inserting into Eq. © yields (7(7) for strongly distributed rates. In Fig.|3]this func- 
tion is compared to a system of identical transition rates. On one hand, for intermediate values of 
7^7* the predictability is much higher than in the system of identical rates. This is a rather coun- 
terintuitive result. Despite the additional randomness in transition rates, the degree of determinism 
is increased. 



VI. CONTROL STRATEGIES 

Fig.HJexemplify h°\v our model can be employed to predict endangered regions if the origin of 
a future epidemic is located quickly. The figure depict simulations of the global spread of SARS 
at t = 90 days after hypothetical outbreaks in New York and London, respectively. Despite the 
worldwide spread of the epidemic in each case, the degree of infection of each country differs 
considerably, which has important consequences for control strategies. 

Vaccination of a fraction of the population reduces the fraction of susceptibles and thus yields 
a smaller effective reproduction number p. If a sufficiently large fraction is vaccinated, p drops be- 
low 1 and the epidemic becomes extinct. The global aviation network can be employed to estimate 
the fraction of the global population that needs to be vaccinated in order to prevent the epidemic 
from spreading. Fig. HI demonstrates that a quick response to an initial outbreak is necessary if 
global vaccination is to be avoided. The Figure depicts the probability p n (v) of having to vacci- 
nate a fraction v of the population if an infected individual is randomly placed in one of the cities 
and permitted to travel n = 1, 2 or 3 times. For the majority of originating cities the initial spread 
is regionally confined and thus a quick response to an outbreak requires only a vaccination of a 
small fraction of the population. However, if the infected individual travels twice, the expected 
fraction (v) of the population which needs to be vaccinated is considerable (74.58%). For n = 3 



12 





0.25 




V 



Figure 4: Left: Geographical representation of the results two simulations of hypothetical SARS 
outbreaks 90 days after an initial infection in (a) New York and (b) London for the same parame- 
ters and color code as in Fig.|2l Right: Impact and control of epidemics. The probability p n (v) of 
having to vaccinate a fraction v of the population in order to prevent the epidemic from spreading, 
if an initial infected individual is permitted to travel n = 1 (red), 2 (blue), and 3 (green) times. 
The probability p n (y) is estimated by placing the infected individual on a node % (black dot) of the 
network. The fraction vi associated with node i is given by the number of susceptibles in a subnet- 
work which can be reached by the infected individual after n = 1, 2 and 3 steps. Histogramming 
Vi for all nodes i yields an estimate for p n (v). The light-blue curve depicts the strong impact of 
isolating only 2% of the largest cities after an initial outbreak (n = 2) and is to be compared to the 
blue curve. 

global vaccination is necessary. 

As a reaction to a new epidemic outbreak, it might be advantageous to impose travel restrictions 
to inhibit the spread. Here we compare two strategies: (i) the shutdown of individual connections 
and (ii) the isolations of cities. Our simulations show that an isolation of only 2% of the largest 
cities already drastically reduces (v) (with n = 2) from 74.58% to 37.50% (compare the blue and 
light-blue curves in Fig.©. In contrast, a shutdown of the strongest connections in the network is 
not nearly as effective. In order to obtain a similar reduction of (v) the top 27.5% of connections 
would need to be taken off the network. Thus, our analysis shows that a remarkable success is 
guaranteed if the largest cities are isolated as a response to an outbreak. 
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In a globalized world with millions of passengers traveling around the world week by week 
infectious diseases may spread rapidly around the world. We believe that a detailed analysis of the 
aviation network represents a cornerstone for the development of efficient quarantine strategies to 
prevent diseases from spreading. As our model is based on a microscopic description of traveling 
individuals our approach may be considered a reference point for the development and simulation 
of control strategies for future epidemics. 

We thank E. Bodenschatz for critical reading of the manuscript and stimulating discussions. 
This research was supported in part by the National Science Foundation under Grant No. PHY99- 
07949. 
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